Hello, Jolene! Nice to see you again! :)

My name is Olga. I'm happy to review your project today.

The first time I see a mistake, I will just point it out and let you find and fix it yourself. In a real job, your boss will do the same, and I'm trying to prepare you to work as a Data Analyst. But if you can't handle this task yet, I will give you a more precise hint at the next check.

Below you will find my comments. Please do not move, modify or delete them.

You can find my comments in green, yellow or red boxes like this:

Reviewer's comment Success. Everything is done successfully.
Reviewer's comment Remarks. Some recommendations.
Reviewer's comment Needs fixing. The block requires some corrections. The work can't be accepted while red comments remain.

You can answer me by using this:

Student answer.
Overall reviewer's comment Jolene, thank you for sending your project. You did a great job! I am really impressed! I like your conclusions and graphs. They are very detailed. Your code is clear, thank you for your work! There are a couple of things that need to be done before your project is complete, but they're pretty straightforward. I believe you can easily fix them! Good luck! :)
Student answer. Hi Olga, thank you for reviewing my project and for your comments. I want to be a good coder and I appreciate your feedback. I have added the graph you requested. It is a good one to include for displaying the event funnel! Thank you, Jolene G.
Overall reviewer's comment. V.2. Hello Jolene! Thank you for correcting your project! Now your project is a true "A". Congratulations! I added comments with some useful information for your case. I hope it will be useful. So, your project has been accepted and you can move on to the next sprint. Keep up the good work, and good luck! :)

Integrated Project 2

Project Description

You work at a startup that sells food products. You need to investigate user behavior for the company's app.

Tasks

Important things to consider

Description of the data

Instructions for completing the project

Step 1. Open the data file and read the general information

Step 2. Prepare the data for analysis

Step 3. Study and check the data

Step 4. Study the event funnel

Step 5. Study the results of the experiment

Step 6. Make a general conclusion

Reviewer's comment Great start with detailed introduction!

Import Libraries

Reviewer's comment Thank you for installing and importing the libraries at the beginning.

Step 1. Open the data file and read the general information

Plan of Action

Open and save original data as a dataframe

Examine general info

Take a closer look at the individual columns

Reviewer's comment So, the basic information is here. Let's start!

Step 1. Conclusion

Open the data file and read the general information

Actions Performed

Step 2. Prepare the data for analysis

Plan of Action

Read and save working version of data

2.a Rename the columns in a way that's convenient for you

2.b Check for missing values and data types. Correct the data if needed

2.c Check for and process duplicate rows
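A minimal sketch of the duplicate check in 2.c. It assumes each log row is a tuple of (event name, user ID, Unix timestamp, group). The event names and values below are hypothetical. In a pandas notebook, `df.duplicated().sum()` and `df.drop_duplicates()` do the same job.

```python
# Hypothetical log rows: (event_name, user_id, unix_timestamp, group)
rows = [
    ("MainScreenAppear", 1, 1564618048, 246),
    ("MainScreenAppear", 1, 1564618048, 246),  # exact duplicate of the row above
    ("OffersScreenAppear", 1, 1564618080, 246),
    ("MainScreenAppear", 2, 1564633200, 248),
]

# dict keys keep insertion order, so the first occurrence of each row is kept
deduped = list(dict.fromkeys(rows))
n_duplicates = len(rows) - len(deduped)

print(f"Duplicates removed: {n_duplicates} ({n_duplicates / len(rows):.1%} of rows)")
```

Reporting the share of rows removed, not just the count, makes it easier to judge whether dropping them could bias the later analysis.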

2.c Conclusion

2.d Add a date and time column and a separate column for dates
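One way to build the two columns for 2.d is shown below. It assumes the raw log stores Unix timestamps in seconds and that UTC is the right time zone; both are assumptions about the data. With pandas, `pd.to_datetime(df['event_timestamp'], unit='s')` followed by `.dt.date` gives the same result, where `event_timestamp` is a hypothetical column name.

```python
from datetime import datetime, timezone

# Hypothetical Unix timestamps (seconds) as they might appear in the raw log
timestamps = [1564617600, 1564660800, 1564703999]

def to_datetime_and_date(ts):
    """Convert a Unix timestamp in seconds to a UTC datetime and its calendar date."""
    event_dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return event_dt, event_dt.date()

for ts in timestamps:
    event_dt, event_date = to_datetime_and_date(ts)
    print(ts, event_dt.isoformat(), event_date)
```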

Step 2. Conclusion

Prepare the data for analysis

Actions Performed

Reviewer's comment Excellent preparation! Everything is done and ready! :)

Step 3. Study and check the data

3.a How many events are in the logs?

3.b How many users are in the logs?

3.c What's the average number of events per user?
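The three numbers in 3.a to 3.c can all come from one per-user count. Below is a sketch on a small hypothetical log of (event name, user ID) pairs. It also prints the median. Events per user are often skewed by a few very active users, so the mean alone can look higher than a typical user.

```python
from collections import Counter
from statistics import median

# Hypothetical cleaned log rows: (event_name, user_id)
events = [
    ("MainScreenAppear", 1), ("OffersScreenAppear", 1), ("CartScreenAppear", 1),
    ("MainScreenAppear", 2), ("MainScreenAppear", 2),
    ("MainScreenAppear", 3), ("Tutorial", 3),
]

n_events = len(events)                                       # 3.a: events in the log
events_per_user = Counter(user_id for _, user_id in events)  # events logged by each user
n_users = len(events_per_user)                               # 3.b: distinct users
mean_events = n_events / n_users                             # 3.c: average events per user
median_events = median(events_per_user.values())

print("Events:", n_events)
print("Users:", n_users)
print(f"Events per user: mean {mean_events:.2f}, median {median_events}")
```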

Reviewer's comment Good numbers! Thank you for the details! :)

3.d What period of time does the data cover?

3.d.1 Find the maximum and the minimum date.

3.d.2 Plot a histogram by date and time.

3.d.3 Find the moment at which the data starts to be complete and ignore the earlier section.

3.d.4 What period does the data actually represent?
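The choice in 3.d.3 is usually made by eye from the histogram. A threshold on daily event counts makes that choice reproducible. The sketch below assumes a cut-off of 50% of the busiest day; that threshold and all the daily counts are hypothetical. The histogram remains the real check.

```python
from collections import Counter
from datetime import date

# Hypothetical number of logged events per calendar date
daily_counts = Counter({
    date(2019, 7, 25): 9,
    date(2019, 7, 28): 40,
    date(2019, 7, 31): 1_800,
    date(2019, 8, 1): 36_000,
    date(2019, 8, 2): 35_500,
    date(2019, 8, 3): 33_300,
})

print("First date:", min(daily_counts), "| last date:", max(daily_counts))

def first_complete_day(counts, share_of_peak=0.5):
    """Earliest date whose event count reaches `share_of_peak` of the busiest day.

    The default threshold of 0.5 is an assumption, not a standard rule.
    """
    peak = max(counts.values())
    return min(day for day, n in counts.items() if n >= share_of_peak * peak)

start = first_complete_day(daily_counts)
print("Data looks complete from:", start)
```

Lowering `share_of_peak` pulls the start date earlier. This is the same trade-off as the review's remark about keeping the last hours of July 31.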

Reviewer's comment Yes, August 1st is a good choice!
Reviewer's comment We could also keep the last 3 hours of July 31 and add a graph for the new period. :)
Student answer. Oh! I didn't think of that, thank you!
Reviewer's comment. V.2. On the job, we would need to check these 3 hours: what types of events occur there, why, etc. :)

3.e Did you lose many events and users when excluding the older data?

3.e Conclusion

Did you lose many events and users when excluding the older data?
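The loss in 3.e can be measured by comparing the log before and after the date filter. In the sketch below, the rows are hypothetical and the cut-off is the 2019-08-01 start date chosen in this project. Users are compared as sets, so a user counts as lost only if none of their events survive the filter.

```python
# Hypothetical log rows: (user_id, event_date as an ISO string)
rows = [
    (1, "2019-07-30"), (1, "2019-08-01"),
    (2, "2019-07-31"),
    (3, "2019-08-01"), (3, "2019-08-02"),
]
CUTOFF = "2019-08-01"  # first day of complete data, chosen in 3.d

# ISO-formatted dates compare correctly as plain strings
kept = [row for row in rows if row[1] >= CUTOFF]

lost_events = len(rows) - len(kept)
users_before = {user for user, _ in rows}
users_after = {user for user, _ in kept}
lost_users = users_before - users_after

print(f"Events lost: {lost_events} ({lost_events / len(rows):.1%})")
print(f"Users lost: {len(lost_users)} ({len(lost_users) / len(users_before):.1%})")
```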

Reviewer's comment So, these numbers are acceptable. Good choice! :)

3.f Make sure you have users from all three experimental groups.

3.f Conclusion

Make sure you have users from all three experimental groups.
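A sketch of the 3.f check on hypothetical (user ID, group) pairs. Besides confirming that groups 246, 247 and 248 are all present, it also lists users who appear in more than one group. Such users would contaminate the comparison between groups.

```python
from collections import defaultdict

# Hypothetical (user_id, group) pairs taken from the filtered log
pairs = [(1, 246), (1, 246), (2, 247), (3, 248), (4, 248)]

users_by_group = defaultdict(set)
groups_by_user = defaultdict(set)
for user, group in pairs:
    users_by_group[group].add(user)
    groups_by_user[user].add(group)

for group in sorted(users_by_group):
    print(group, len(users_by_group[group]))

# All three experimental groups should be present
assert set(users_by_group) == {246, 247, 248}, "an experimental group is missing"

# A user assigned to several groups would blur the A/A/B comparison
in_several_groups = [user for user, groups in groups_by_user.items() if len(groups) > 1]
print("Users in more than one group:", in_several_groups)
```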

Reviewer's comment Groups are normal. Let's continue! :)

Step 3. Conclusion

Study and check the data

After removing duplicates

After removing insufficient data and focusing on events from 2019-08-01 to 2019-08-07

Reviewer's comment Thank you for the conclusion after this step!

Step 4. Study the event funnel

4.a See what events are in the logs and their frequency of occurrence. Sort them by frequency.

4.a Conclusion

See what events are in the logs and their frequency of occurrence. Sort them by frequency.
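Event frequencies for 4.a can be counted with `collections.Counter`. Its `most_common()` method already returns the events sorted by count, largest first. The event names and counts below are hypothetical.

```python
from collections import Counter

# Hypothetical event names from the filtered log
event_names = [
    "MainScreenAppear", "MainScreenAppear", "MainScreenAppear", "MainScreenAppear",
    "OffersScreenAppear", "OffersScreenAppear",
    "CartScreenAppear", "PaymentScreenSuccessful", "Tutorial",
]

frequency = Counter(event_names)
total = sum(frequency.values())
for name, count in frequency.most_common():  # sorted by count, descending
    print(f"{name:<25}{count:>5}{count / total:>8.1%}")
```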

Reviewer's comment So, we sorted the events by frequency, and we know the percentages. But they are not very informative for us.

4.b Find the number of users who performed each of these actions. Sort the events by the number of users.

4.b Conclusion

Find the number of users who performed each of these actions. Sort the events by the number of users.
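For 4.b, the counts should be of distinct users rather than events, so repeated actions by one user count once. A sketch on hypothetical (event name, user ID) pairs, using one set of users per event:

```python
from collections import defaultdict

# Hypothetical (event_name, user_id) pairs; user 1 opens the main screen twice
log = [
    ("MainScreenAppear", 1), ("MainScreenAppear", 1), ("MainScreenAppear", 2),
    ("MainScreenAppear", 3), ("OffersScreenAppear", 1), ("OffersScreenAppear", 2),
    ("CartScreenAppear", 1), ("PaymentScreenSuccessful", 1), ("Tutorial", 3),
]

users_per_event = defaultdict(set)
for event, user in log:
    users_per_event[event].add(user)  # a set ignores repeated actions by the same user

# Sort events by the number of distinct users, largest first
ranking = sorted(users_per_event.items(), key=lambda item: len(item[1]), reverse=True)
for event, users in ranking:
    print(f"{event:<25}{len(users):>4}")
```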

Reviewer's comment Unique user counts give more meaningful percentages! :)

4.b.1 Calculate the proportion of users who performed the action at least once.

4.b.1 Conclusion

Calculate the proportion of users who performed the action at least once.
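The proportion in 4.b.1 is the distinct users per event divided by all distinct users in the filtered log. The counts below are hypothetical and only illustrate the calculation.

```python
# Hypothetical distinct-user counts per event and total distinct users in the log
users_per_event = {
    "MainScreenAppear": 7_400,
    "OffersScreenAppear": 4_600,
    "CartScreenAppear": 3_700,
    "PaymentScreenSuccessful": 3_500,
    "Tutorial": 840,
}
total_users = 7_500

# Share of all users who performed each action at least once
share_at_least_once = {event: n / total_users for event, n in users_per_event.items()}
for event, share in share_at_least_once.items():
    print(f"{event:<25}{share:>7.1%}")
```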

Percentage of Number of Actions Performed by Individual Users

Reviewer's comment Very interesting information! Now we can check users' actions!

4.c In what order do you think the actions took place? Are all of them part of a single sequence? You don't need to take them into account when calculating the funnel.

4.c Conclusion

In what order do you think the actions took place? Are all of them part of a single sequence? You don't need to take them into account when calculating the funnel.

Reviewer's comment Great conclusion!

4.d Use the event funnel to find the share of users that proceed from each stage to the next. (For instance, for the sequence of events A → B → C, calculate the ratio of users at stage B to the number of users at stage A and the ratio of users at stage C to the number at stage B.)

4.d Conclusion

Use the event funnel to find the share of users that proceed from each stage to the next. (For instance, for the sequence of events A → B → C, calculate the ratio of users at stage B to the number of users at stage A and the ratio of users at stage C to the number at stage B.)
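The A → B → C ratios described in 4.d can be computed by pairing each stage with the next one. The stage order and user counts below are hypothetical. The order is an assumption that would come from the answer to 4.c.

```python
# Hypothetical distinct users at each funnel stage, in the assumed order of the sequence
funnel = [
    ("MainScreenAppear", 7_400),
    ("OffersScreenAppear", 4_600),
    ("CartScreenAppear", 3_700),
    ("PaymentScreenSuccessful", 3_500),
]

def step_conversion(stages):
    """Return (previous stage, stage, users at stage / users at previous stage) for each step."""
    return [
        (prev_name, name, n / prev_n)
        for (prev_name, prev_n), (name, n) in zip(stages, stages[1:])
    ]

for prev_name, name, ratio in step_conversion(funnel):
    print(f"{prev_name} -> {name}: {ratio:.1%}")
```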

Reviewer's comment Thank you for such a wonderful conclusion!
Reviewer's comment Could you plot a graph for the funnel? :)
Student answer. Graph added. This is a very useful way to look at funnel data, thank you!
Reviewer's comment. V.2. Perfect graph with percentages! :)
Reviewer's comment. V.2. We can also plot the same graph for each group. It gives us more information. :)

4.e At what stage do you lose the most users?

4.e Conclusion

At what stage do you lose the most users?
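To answer 4.e, compare the drop-off at each transition as a share of the previous stage, not as a raw count. A sketch on the same kind of hypothetical stage counts:

```python
# Hypothetical distinct users at each stage (same assumed order as the funnel)
funnel = [
    ("MainScreenAppear", 7_400),
    ("OffersScreenAppear", 4_600),
    ("CartScreenAppear", 3_700),
    ("PaymentScreenSuccessful", 3_500),
]

# Users lost at each transition: absolute number and share of the previous stage
losses = [
    (f"{a} -> {b}", n_a - n_b, 1 - n_b / n_a)
    for (a, n_a), (b, n_b) in zip(funnel, funnel[1:])
]
worst = max(losses, key=lambda step: step[2])
print(f"Largest drop-off: {worst[0]} ({worst[1]} users, {worst[2]:.1%} of the previous stage)")
```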

Reviewer's comment You are right!

4.f What share of users make the entire journey from their first event to payment?

4.f Conclusion

What share of users make the entire journey from their first event to payment?
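There are two reasonable readings of 4.f, and the sketch below computes both on hypothetical sets of user IDs. The first is the 4.d ratio applied end to end: payers divided by users at the first stage. The second is stricter and only counts users seen at every stage. Which one the project intends is an assumption.

```python
# Hypothetical sets of user IDs that triggered each stage at least once
stage_users = {
    "MainScreenAppear": {1, 2, 3, 4, 5},
    "OffersScreenAppear": {1, 2, 3},
    "CartScreenAppear": {1, 2},
    "PaymentScreenSuccessful": {1, 2, 6},  # user 6 paid without a logged main-screen event
}
order = ["MainScreenAppear", "OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful"]

first, last = stage_users[order[0]], stage_users[order[-1]]

# End-to-end version of the 4.d ratio: payers / users at the first stage
simple_share = len(last) / len(first)

# Stricter version: users present at every stage, relative to the first stage
completed_all = set.intersection(*(stage_users[stage] for stage in order))
strict_share = len(completed_all) / len(first)

print(f"Payers / first stage: {simple_share:.1%}")
print(f"Completed every stage: {strict_share:.1%}")
```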

Step 4. Conclusion

Study the event funnel

Reviewer's comment Yep!

Step 5. Study the results of the experiment

5.a How many users are there in each group?

Reviewer's comment Numbers are ok. :)

5.a Conclusion

How many users are there in each group?

5.b Determine if there is a statistically significant difference between sample groups

5.b.1 Create a function to find the following for each action in the funnel

Testing the Hypothesis that Proportions Are Equal - Lesson

In stats we learned about hypothesis testing of population means:

- we drew conclusions from samples by comparing their means with a certain number or by checking whether two means are equal

Another typical task is to test hypotheses about the equality of proportions of populations

The difference between the proportions we observe in our samples will be our STATISTIC

$$Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1)$$

Z = the standardized value of the statistic. For large samples, the expression follows the standard normal distribution N(0,1), with mean 0 and standard deviation 1.

n1, n2 = sizes of the two samples being compared, i.e., the number of observations they contain

P1, P2 = proportions observed in the samples

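P = (x1 + x2) / (n1 + n2) = the pooled proportion, i.e., the proportion observed in the sample combined from all n1 + n2 observations, where x1 and x2 are the numbers of successes in each sample (so P1 = x1 / n1 and P2 = x2 / n2)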

π₁, π₂ = the actual proportions in the populations being compared

With A/B testing, one usually tests the hypothesis that π₁ = π₂. If the null hypothesis is true, the term (π₁ - π₂) in the numerator equals 0, so the criterion can be calculated from the sample data alone. The resulting statistic is normally distributed, which makes it possible to carry out two-sided and one-sided (bilateral and unilateral) tests. Using the same null hypothesis that the two populations' proportions are equal, we can test the alternative hypotheses that 1) the proportions simply aren't equal, or 2) one proportion is larger or smaller than the other.
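Below is a standard-library sketch of a two-sided test function matching the formula above with π₁ - π₂ = 0. It uses the pooled proportion P and gets the p-value from `statistics.NormalDist`. The function name, the example counts and alpha = 0.05 are hypothetical and are not the values used in this project. If statsmodels is installed, `statsmodels.stats.proportion.proportions_ztest`, which pools the proportions by default, should give the same result.

```python
import math
from statistics import NormalDist

def proportions_z_test(successes_1, n_1, successes_2, n_2):
    """Two-sided z-test for H0: pi_1 == pi_2 using the pooled proportion P.

    Returns (z, p_value). Assumes independent samples large enough
    for the normal approximation.
    """
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    p_pooled = (successes_1 + successes_2) / (n_1 + n_2)  # P in the formula
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / se  # (pi_1 - pi_2) is 0 under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: users who reached a stage out of all users in two groups
z, p = proportions_z_test(1_250, 2_500, 1_200, 2_500)
alpha = 0.05  # hypothetical significance level
print(f"z = {z:.3f}, p-value = {p:.3f}")
print("Reject H0" if p < alpha else "Cannot reject H0")
```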

Reviewer's comment Excellent function and preparation! We are ready for tests! :)

5.b.2 Use the function to perform hypothesis tests

5.b.2 A/A test

Control group 246 vs control group 247

5.b.2 A/A test Conclusion

Reviewer's comment It is very good! We have no problems! :)

5.b.2 A1/B test

Control Group 246 vs Test Group 248

5.b.2 A1/B test Conclusion

Control Group 246 vs Test Group 248 Conclusion

Reviewer's comment Well done! :)

5.b.2 A2/B test

Control Group 247 vs Test Group 248

5.b.2 A2/B test Conclusion

Control Group 247 vs Test Group 248 Conclusion

Reviewer's comment Well done again! :)

5.b.2 A combo/B test

Combined Control Group 249 (246 + 247) vs Test Group 248
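Combining the two control groups amounts to adding their success counts and their sizes for each event. This is only valid if no user belongs to both groups. The per-group counts and the helper name below are hypothetical. The label 249 is this project's name for the combined group.

```python
# Hypothetical counts for one funnel event: (users who did the event, users in group)
counts = {246: (1_250, 2_480), 247: (1_230, 2_510), 248: (1_240, 2_500)}

def combine_groups(table, groups):
    """Pool successes and group sizes; assumes each user belongs to exactly one group."""
    successes = sum(table[g][0] for g in groups)
    total = sum(table[g][1] for g in groups)
    return successes, total

combined_control = combine_groups(counts, [246, 247])
print("Combined control group 249:", combined_control)
```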

5.b.2 A combo/B test Conclusion

Combined Control Group 249 (246 + 247) vs Test Group 248 Conclusion

Reviewer's comment So, that was the last test, and everything is clear! :)

5.b Conclusion

Reviewer's comment Yes, the fonts do not impact user behavior. :)

5.c What significance level have you set to test the statistical hypotheses mentioned above?

5.c.1 Calculate how many statistical hypothesis tests you carried out.

5.c.2 With a statistical significance level of 0.1, one in 10 results could be false. What should the significance level be?
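The arithmetic behind 5.c.1 and 5.c.2 is sketched below. It assumes, hypothetically, that four funnel events are tested in each of the four comparisons from 5.b.2. It prints the chance of at least one false positive, which holds only if the tests were independent and every null hypothesis were true. It then applies the Bonferroni correction, which divides alpha by the number of tests.

```python
# Hypothetical plan: every funnel event is tested in every comparison from 5.b.2
funnel_events = ["MainScreenAppear", "OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful"]
comparisons = ["246 vs 247", "246 vs 248", "247 vs 248", "249 (246 + 247) vs 248"]

n_tests = len(funnel_events) * len(comparisons)
alpha = 0.1  # the level mentioned in 5.c.2

# Chance of at least one false positive if the tests were independent and every H0 held
family_wise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: divide the desired overall level by the number of tests
bonferroni_alpha = alpha / n_tests

print("Tests:", n_tests)
print(f"Chance of at least one false positive: {family_wise_error:.1%}")
print(f"Bonferroni-corrected alpha: {bonferroni_alpha:.5f}")
```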

5.c.2 Conclusion

5.c.3 If you want to change it, run through the previous steps again and check your conclusions.

5.c.3 Conclusion

Reviewer's comment Bonferroni is an excellent choice!

Step 5. Conclusion

Extra test to compare the control groups with the Bonferroni correction

Step 6. Overall Conclusion

General Information

Sales/Event Funnel

A/A/B test on font styles

General Conclusion

Recommendations

Report conclusion update post completion of the program

Updates to the summary above

Reviewer's comment So, you are right: the new fonts are not important. We ran a lot of tests, chose a good alpha, and confirmed it in code. Thank you for the recommendations and for summing everything up here. Perfect conclusion! :)